Policy Gradient Method for Team Markov Games
Abstract
The main aim of this paper is to extend the single-agent policy gradient method to multiagent domains in which all agents share the same utility function. We formulate these team problems as Markov games endowed with an asymmetric equilibrium concept and, based on this formulation, provide a direct policy gradient learning method. We also test the proposed method on a small example problem.
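As a rough illustration of a direct policy gradient in a shared-utility team setting, the sketch below runs independent REINFORCE learners on a 2x2 identical-payoff game. This is a minimal sketch only, not the paper's formulation: the payoff matrix, learning rate, and episode count are assumptions, and the asymmetric equilibrium machinery is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 identical-payoff game: both agents receive R[a0, a1].
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = [np.zeros(2), np.zeros(2)]  # independent logits per agent
alpha = 0.1

for _ in range(5000):
    pis = [softmax(t) for t in theta]
    acts = [rng.choice(2, p=pi) for pi in pis]
    r = R[acts[0], acts[1]]            # shared utility for the whole team
    for i in (0, 1):
        grad = -pis[i]                 # REINFORCE: one_hot(a) - pi
        grad[acts[i]] += 1.0
        theta[i] += alpha * r * grad

print([softmax(t).round(3) for t in theta])  # both drift toward action 0
```

With this payoff matrix, both independent learners typically converge on the higher-paying coordinated joint action (0, 0), which is the basic effect a team policy gradient method aims to achieve.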
Similar resources
Learning of Soccer Player Agents Using a Policy Gradient Method: Pass Selection
This research develops a learning method for the pass-selection problem of midfielders in RoboCup Soccer Simulation games. A policy gradient method is applied because it can readily represent the various pass-selection heuristics in a policy function. We implement the learning function in the midfielders' programs of a well-known team, UvA Trilearn B...
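A pass-selection policy of this kind can be represented as a softmax over candidate receivers scored by heuristic features. The sketch below is a generic illustration, not UvA Trilearn's code: the feature set, reward, and step size are hypothetical.

```python
import numpy as np

def pass_policy(features, w):
    """Softmax over candidate passes; features: (n_candidates, n_features)."""
    s = features @ w
    z = np.exp(s - s.max())
    return z / z.sum()

def grad_log_pi(features, w, chosen):
    # For a linear softmax policy: grad log pi(a) = phi(a) - E_pi[phi]
    pi = pass_policy(features, w)
    return features[chosen] - pi @ features

# Hypothetical features per candidate receiver: [openness, progress to goal]
feats = np.array([[0.9, 0.2],
                  [0.4, 0.5],
                  [0.7, 0.8]])
w = np.zeros(2)
rng = np.random.default_rng(1)
a = int(rng.choice(len(feats), p=pass_policy(feats, w)))
reward = 1.0                                  # e.g. the pass was completed
w += 0.1 * reward * grad_log_pi(feats, w, a)  # policy gradient step
```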
Rational and Convergent Model-Free Adaptive Learning for Team Markov Games
In this paper, we address multi-agent decision problems where all agents share a common goal. This class of problems is suitably modeled using finite-state Markov games with identical interests. We tackle the problem of coordination and contribute a new algorithm, coordinated Q-learning (CQL). CQL combines Q-learning with biased adaptive play, a coordination mechanism based on the principle of f...
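For orientation, the sketch below shows the shared joint-action Q-learning backbone that identical-interest methods of this kind build on. It deliberately omits CQL's biased adaptive play, and the state names, action-set sizes, and hyperparameters are assumptions.

```python
import numpy as np
from collections import defaultdict

n_actions = (2, 2)                       # action-set sizes of the two agents
Q = defaultdict(lambda: np.zeros(n_actions))  # one shared Q over joint actions
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def select_joint_action(s):
    if rng.random() < eps:               # joint epsilon-greedy exploration
        return tuple(int(rng.integers(n)) for n in n_actions)
    q = Q[s]                             # greedy joint action on shared Q
    return tuple(int(i) for i in np.unravel_index(q.argmax(), q.shape))

def update(s, joint_a, r, s_next, done):
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s][joint_a] += alpha * (target - Q[s][joint_a])

update("s0", select_joint_action("s0"), 1.0, "s1", False)   # usage
```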
Learning to Cooperate via Policy Search
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they apply only when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environm...
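A minimal sketch of what policy search looks like under partial observability: each agent updates a stochastic policy conditioned only on its own local observation, using the shared episode return. The table sizes and learning rate below are assumptions, and the paper's exact algorithm may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_act = 3, 2                 # sizes are illustrative assumptions

# One table of logits per agent: theta_i[o] are the action preferences
# the agent holds after seeing only its own local observation o.
theta = [np.zeros((n_obs, n_act)) for _ in range(2)]

def act(theta_i, obs):
    logits = theta_i[obs]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    a = int(rng.choice(n_act, p=p))
    return a, p

def reinforce_update(theta_i, history, ret, lr=0.05):
    """history: list of (obs, action, action_probs) from one episode;
    ret: the shared return received by the whole team."""
    for obs, a, p in history:
        g = -p.copy()
        g[a] += 1.0                 # grad log pi = one_hot(a) - p
        theta_i[obs] += lr * ret * g
```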
UvA Rescue Team Description Paper Agent competition Rescue Simulation League RoboCup 2014 - João Pessoa - Brazil
The contribution of the UvA Rescue Team is an attempt to lay a theoretical foundation by formally describing the planning and coordination problem as a POMDP, which makes it possible to apply POMDP solution methods in this application area. To be able to solve the POMDP for large state spaces and long planning histories, our team has chosen an approximation of the Dec-POMDPs throu...
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches....
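One widely used family of approximate approaches is the GPOMDP-style gradient estimator of Baxter and Bartlett for reactive policies pi(a | o; theta). The sketch below is a generic illustration of that estimator, not necessarily the algorithm developed in this particular paper; the trace discount beta and the grad_log_pi callback are assumptions.

```python
import numpy as np

def gpomdp_gradient(episode, grad_log_pi, dim, beta=0.95):
    """GPOMDP-style estimate; episode is an iterable of (obs, action, reward).

    Accumulates a discounted eligibility trace of grad log pi(a | o; theta)
    and averages its correlation with the instantaneous rewards.
    """
    z = np.zeros(dim)               # eligibility trace of log-policy gradients
    g = np.zeros(dim)               # running gradient estimate
    for t, (obs, a, r) in enumerate(episode, start=1):
        z = beta * z + grad_log_pi(obs, a)
        g += (r * z - g) / t        # incremental average of r_t * z_t
    return g
```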